CLASSIFICATION OF INTERMITTENT DEPENDENT OBSERVATIONS



by 

P. A. Jacobs and D. P. Gaver 



1. THE PROBLEM 

Consider the following classification problem. Suppose there are J items
(e.g., diseases), each of which has a characteristic Signature that varies in
time; the Signature of Item i is

$$Y_i(t) = \theta_i + X_i(t), \qquad i = 1,2,\ldots,J; \quad t = 0,1,2,\ldots \qquad (1.1)$$

For the moment $\{X_i(t)\}$ is an unspecified multivariate (or univariate)
stochastic process, but one that stays near 0 in finite time and has some
stationary or steady-state behavior. In many cases, paths of $X_i(t)$ will appear
somewhat "continuous," so successive $X_i(t)$'s are not well modeled as iid
random variables. One could think of $Y_i(t)$ as physical indices characteristic
of a particular disease, e.g., blood pressure, heart-beat pattern, cholesterol
levels. Examples from equipment reliability are also of interest; here physical
indices might be vibration, variations in heat level, oil leakage, and even fuel
consumption in the case of engines.

In many circumstances $Y_i(t)$ is only observable occasionally, at times
unrelated to the value of $Y_i(t)$ but driven by other forces such as the
scheduling of a routine physical exam or system inspection. Suppose that the 
Signature and the identity of the item associated with the Signature are both 
observed at time t = 0, on such an occasion. Suppose that, later on, however, 
only the Signature of an item is observed. The first question is: What is the 
probability that, given the Signature value observed, its originating item is 
any particular one of the J candidates? 

In Gaver and Jacobs [1989], the processes $\{X_i(t)\}$ are assumed to be
univariate Gaussian and a Bayesian classification procedure is studied. In this
paper, Section 2 assumes $\{X_i(t)\}$ are multivariate normal autoregressive
processes. In Section 3, $\{X_i(t)\}$ is a univariate Cauchy autoregressive process
whose marginal distribution has longer tails than the Gaussian. A Bayesian 
classification procedure for the Cauchy data is studied. In Section 4, we study 
the behavior of the univariate Cauchy and Gaussian classification procedures 
when autoregressive data having the wrong marginal distribution are 
presented to them. The results suggest that the Gaussian classification 
procedure is biased towards classifying a Signature produced at time t as being 
associated with the same item that produced the Signature at time 0. The 
Cauchy classification procedure is biased towards classifying a Signature 
produced at time t as being associated with a different item than the one 
producing the Signature at time 0. These effects are strongest for small times 
t. The largest number of misclassifications occurs for small times t when the
Gaussian classification procedure is presented with Cauchy data and a 
different item is associated with the Signature at time t than the item 
associated with the Signature at time 0; in this situation the Gaussian 
procedure is relatively less sensitive to the change in the item associated with 
the Signature. Misclassifications by the Cauchy classification procedure are 
modest in comparison to this extreme case. 

In summary, it is important to realize that the performance of a Bayesian 
classification procedure can be influenced by its underlying distributional 
assumptions. A classification procedure based on Gaussian distributional
assumptions can be reluctant to classify a new observation coming from a
different item as being associated with a different item. A classification procedure
based on Cauchy distributional assumptions can be reluctant to classify a new
observation that comes from the same item as being associated with
that same item. Hence, if there is uncertainty about the underlying
distribution of the data, it might be better to combine results of several 
classification procedures based on different distributional assumptions. 
2. THE MULTIVARIATE NORMAL CASE



2.1 The Classification Question 

Assume for illustration that the Signature of Item j is multivariate AR(1):

$$Y_j(t) = \theta_j + X_j(t) \qquad (2.1)$$

where $\theta_j$, $Y_j(t)$, and $X_j(t)$ are d-dimensional column vectors. The process
$\{X_j(t)\}$ is a vector AR(1) process

$$X_j(t) = A_jX_j(t-1) + E_j(t) \qquad (2.2)$$

where $A_j$ is a $d \times d$ matrix and $\{E_j(t)\}$ is a sequence of d-dimensional column
vectors which are independent multivariate normal with mean 0 and
variance-covariance matrix $\Lambda_j$. The variance-covariance matrix for $X_j(t+1)$
is

$$\Gamma_j(t+1) = E[X_j(t+1)X_j^T(t+1)] = A_j\Gamma_j(t)A_j^T + \Lambda_j. \qquad (2.3)$$

We will assume $A_j$ and $\Lambda_j$ are such that there is a finite unique solution to
the equation

$$\Gamma_j = A_j\Gamma_jA_j^T + \Lambda_j. \qquad (2.4)$$

Assume $X_j(0)$ has a normal distribution with mean 0 and variance-covariance
matrix $\Gamma_j$. It follows that $\{X_j(t)\}$ is a stationary sequence with
mean 0 and variance-covariance matrix $\Gamma_j$.

The conditional distribution of $X_j(t)$ given $X_j(0) = x$ is multivariate
normal with mean $A_j^tx$ and variance-covariance matrix

$$\Lambda_j(t) = \sum_{n=0}^{t-1} A_j^n\Lambda_j(A_j^n)^T. \qquad (2.5)$$

Thus, $\Gamma_j = \lim_{t\to\infty}\Lambda_j(t)$.

The conditional distribution of the actually observable $Y_j(t)$ given
$Y_j(0) = y(0)$ is multivariate normal with mean $\theta_j + A_j^t(y(0)-\theta_j)$ and
variance-covariance matrix $\Lambda_j(t)$.

Operational Scenario: There are, potentially, J items. Let C(t) be the identity
of the item whose Signature is observed at time t. Put $p_j(t) = P\{C(t)=j\}$.
Assume that it is known that the Signature observed at time 0 comes from
Item i; that is, $C(0) = i$ and $Y(0) = Y_i(0) = y(0)$. If it has been a long time since a
Signature from Item i has been observed, it is reasonable to suppose that

$$P\{C(0) = i, Y(0) \in dy(0)\} = p_i(0)\big((2\pi)^d|\Gamma_i|\big)^{-1/2}\exp\Big\{-\tfrac{1}{2}(y(0)-\theta_i)^T\Gamma_i^{-1}(y(0)-\theta_i)\Big\}\,dy(0), \qquad (2.6)$$

the long-run or steady-state distribution. Further,

$$P\{C(t) = i, Y(t) \in dy(t) \mid C(0) = i, Y(0) = y(0)\} = p_i(t)\big((2\pi)^d|\Sigma_i(t)|\big)^{-1/2}\exp\Big\{-\tfrac{1}{2}(y(t)-m_i(t))^T\Sigma_i(t)^{-1}(y(t)-m_i(t))\Big\}\,dy(t) \qquad (2.7)$$

where

$$m_i(t) = \theta_i + A_i^t(y(0)-\theta_i) \qquad (2.8)$$

and

$$\Sigma_i(t) = \Lambda_i(t). \qquad (2.9)$$

For $j \ne i$, we will assume the conditional distribution of Y(t) given $C(0) = i$,
$Y(0) = y(0)$, $C(t) = j$ is multivariate normal with mean $m_j(t) = \theta_j$ and
variance-covariance matrix $\Sigma_j(t) = \Gamma_j$, since it has still been a long time since a
Signature from Item j was observed.

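It now follows that

$$P\{Y(t) \in dy(t) \mid C(0) = i, Y(0) = y(0)\} = \sum_{k=1}^{J} p_k(t)\big((2\pi)^d|\Sigma_k(t)|\big)^{-1/2}\exp\Big\{-\tfrac{1}{2}(y(t)-m_k(t))^T\Sigma_k(t)^{-1}(y(t)-m_k(t))\Big\}\,dy(t). \qquad (2.10)$$

Thus, the posterior probability of the identity of the item associated with
the Signature is

$$P\{C(t) = j \mid C(0) = i, Y(0) = y(0), Y(t) = y(t)\} = \frac{p_j(t)|\Sigma_j(t)|^{-1/2}\exp\{-\tfrac{1}{2}(y(t)-m_j(t))^T\Sigma_j(t)^{-1}(y(t)-m_j(t))\}}{\sum_{k=1}^{J} p_k(t)|\Sigma_k(t)|^{-1/2}\exp\{-\tfrac{1}{2}(y(t)-m_k(t))^T\Sigma_k(t)^{-1}(y(t)-m_k(t))\}}. \qquad (2.11)$$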
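What follows is a minimal sketch of how (2.11) might be evaluated numerically in plain Python. It works on the log scale and subtracts the largest term before exponentiating, so the exponentials cannot underflow. The factor $(2\pi)^{-d/2}$ is dropped because it is common to every item and cancels. The function names are hypothetical. The caller is assumed to supply $p_j(t)$, $m_j(t)$ and $\Sigma_j(t)$ as defined above.

```python
import math

def cholesky(S):
    """Lower-triangular L with L L^T = S, for a symmetric positive-definite S."""
    d = len(S)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def log_kernel(y, m, S):
    """log of |S|^{-1/2} exp{-(y-m)^T S^{-1} (y-m) / 2}."""
    L = cholesky(S)
    z = []
    for i in range(len(y)):
        # Forward substitution for L z = y - m, so that z^T z = (y-m)^T S^{-1} (y-m).
        z.append((y[i] - m[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(len(y)))
    return -0.5 * log_det - 0.5 * sum(v * v for v in z)

def posterior(y_t, priors, means, covs):
    """Posterior (2.11): priors[j] = p_j(t), means[j] = m_j(t), covs[j] = Sigma_j(t)."""
    logs = [math.log(p) + log_kernel(y_t, m, S) for p, m, S in zip(priors, means, covs)]
    top = max(logs)  # shift by the maximum before exponentiating
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]

# Two items in d = 2 with identity covariances; y(t) sits exactly at m_1(t).
I2 = [[1.0, 0.0], [0.0, 1.0]]
print(posterior([0.0, 0.0], [0.5, 0.5], [[0.0, 0.0], [2.0, 0.0]], [I2, I2]))
```

In the usage line the priors are equal, $m_1(t) = (0,0)^T$ and $m_2(t) = (2,0)^T$. The posterior for Item 1 is then $1/(1+e^{-2}) \approx 0.881$.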



2.2 The Probability of an Incorrect Classification 

In this section we assume that the item associated with the
Signature at time t, given the last complete observation at time 0, will be
estimated to be the one that maximizes the posterior probability (2.11).

For a simple illustration we will suppose that there are only J = 2 possible
items with known parameters $\theta_1$ and $\theta_2$.

Given $Y(0) = y(0)$, $C(0) = 1$, and $C(t) = 1$, the conditional distribution of Y(t) is
multivariate normal with mean

$$m_1(t) = \theta_1 + A_1^t(y(0)-\theta_1) \qquad (2.12)$$

and variance-covariance matrix

$$\Sigma_1(t) = \sum_{k=0}^{t-1} A_1^k\Lambda_1(A_1^k)^T. \qquad (2.13)$$

Let the matrices $H_i(t)$ and $H_i$ be such that

$$H_i(t)H_i(t)^T = \Sigma_i(t) \qquad (2.14)$$

and

$$H_iH_i^T = \Gamma_i. \qquad (2.15)$$

It follows that

$$Y(t) \overset{d}{=} m_1(t) + H_1(t)U \qquad (2.16)$$

where U is a d-dimensional column vector whose components are
independent standard normal random variables; the notation $\overset{d}{=}$ means equality
in distribution. Thus, given $Y(0) = y(0)$, $C(0) = 1$, $C(t) = 1$,

$$(Y(t)-m_1(t))^T\Sigma_1(t)^{-1}(Y(t)-m_1(t)) \overset{d}{=} U^TU \qquad (2.17)$$

and

$$(Y(t)-m_2(t))^T\Sigma_2(t)^{-1}(Y(t)-m_2(t)) \overset{d}{=} (m_1(t)-m_2(t)+H_1(t)U)^T\Sigma_2(t)^{-1}(m_1(t)-m_2(t)+H_1(t)U) \qquad (2.18)$$

where $m_2(t) = \theta_2$ and $\Sigma_2(t) = \Gamma_2$. Thus, the probability of a misclassification is
$$P\{\text{classify the item as } 2 \mid C(0) = 1, Y(0) = y(0), C(t) = 1\}$$
$$= P\Big\{p_2(t)|\Sigma_2(t)|^{-1/2}\exp\{-\tfrac{1}{2}(m_1(t)-m_2(t)+H_1(t)U)^T\Sigma_2(t)^{-1}(m_1(t)-m_2(t)+H_1(t)U)\} > p_1(t)|\Sigma_1(t)|^{-1/2}\exp\{-\tfrac{1}{2}U^TU\}\Big\}$$
$$= P\Big\{\frac{p_2(t)}{p_1(t)}\Big(\frac{|\Sigma_1(t)|}{|\Gamma_2|}\Big)^{1/2} > \exp\{-\tfrac{1}{2}U^TU + \tfrac{1}{2}(a(t)+H_1(t)U)^T\Gamma_2^{-1}(a(t)+H_1(t)U)\}\Big\} \qquad (2.19)$$

where

$$a(t) = \theta_1 + A_1^t(y(0)-\theta_1) - \theta_2. \qquad (2.20)$$

Example: Assume $A_i = A$, $\Lambda_i = \Lambda$ for i = 1, 2, and $p_1(t) = p_2(t)$; then

$$P\{\text{wrong classification} \mid C(0) = 1, Y(0) = y(0), C(t) = 1\} = P\Big\{\Big(\frac{|\Lambda(t)|}{|\Gamma|}\Big)^{1/2} > \exp\{-\tfrac{1}{2}U^TU + \tfrac{1}{2}(a(t)+H(t)U)^T\Gamma^{-1}(a(t)+H(t)U)\}\Big\} \qquad (2.21)$$

where

$$\Lambda(t) = \sum_{k=0}^{t-1} A^k\Lambda(A^k)^T = H(t)H(t)^T;$$

$\Gamma = \lim_{t\to\infty}\Lambda(t) = HH^T$ is the solution to the equation

$$\Gamma = A\Gamma A^T + \Lambda;$$

and

$$a(t) = \theta_1 + A^t(y(0)-\theta_1) - \theta_2.$$

Note that as $t \to \infty$

$$P\{\text{wrong classification} \mid C(0) = 1, Y(0) = y(0), C(t) = 1\}$$
$$\to P\Big\{1 > \exp\{-\tfrac{1}{2}U^TU + \tfrac{1}{2}(\theta_1-\theta_2+HU)^T\Gamma^{-1}(\theta_1-\theta_2+HU)\}\Big\}$$
$$= P\Big\{1 > \exp\{\tfrac{1}{2}(\theta_1-\theta_2)^T(HH^T)^{-1}(\theta_1-\theta_2) + (\theta_1-\theta_2)^T(H^T)^{-1}U\}\Big\}$$
$$= P\Big\{-\tfrac{1}{2}(\theta_1-\theta_2)^T(HH^T)^{-1}(\theta_1-\theta_2) > (\theta_1-\theta_2)^T(H^T)^{-1}U\Big\}. \qquad (2.22)$$
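The limit (2.22) can be evaluated in closed form as a normal tail probability. The quantity $(\theta_1-\theta_2)^T(H^T)^{-1}U$ is normal with mean 0 and variance $\delta^2 = (\theta_1-\theta_2)^T\Gamma^{-1}(\theta_1-\theta_2)$. Hence (2.22) equals $\Phi(-\delta/2)$, where Φ is the standard normal distribution function.

The sketch below evaluates this for d = 2 only, and its function names are hypothetical. The Γ in the usage lines is the rounded matrix reported for the simulation example of this section.

```python
import math

def std_normal_cdf(x):
    """Phi(x) via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def limiting_misclassification(theta1, theta2, gamma):
    """Phi(-delta/2), delta^2 = (theta1-theta2)^T Gamma^{-1} (theta1-theta2); d = 2 only."""
    (a, b), (_, c) = gamma
    det = a * c - b * b
    d0, d1 = theta1[0] - theta2[0], theta1[1] - theta2[1]
    # Quadratic form with the explicit inverse of a symmetric 2x2 matrix.
    delta2 = (c * d0 * d0 - 2.0 * b * d0 * d1 + a * d1 * d1) / det
    return std_normal_cdf(-math.sqrt(delta2) / 2.0)

gamma = [[5.4, 3.8], [3.8, 4.5]]
print(round(limiting_misclassification((1, 1), (2, 2), gamma), 2))    # 0.4
print(round(limiting_misclassification((1, 1), (-2, 2), gamma), 2))   # 0.09
```

For $\theta_2 = (2,2)^T$ and $\theta_2 = (-2,2)^T$ this gives about 0.40 and 0.09. These values are in line with the t = 40 column of Table A.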



A Simulation 

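Table A gives the results of a simulation experiment for the case $\theta_1 = (1,1)^T$,

$$A = \begin{pmatrix} 0.1 & 0.9 \\ 0.4 & 0.5 \end{pmatrix} \quad\text{and}\quad \Lambda = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

In this case,

$$\Gamma = \begin{pmatrix} 5.4 & 3.8 \\ 3.8 & 4.5 \end{pmatrix}.$$

Figure 1 shows contours from a bivariate normal distribution having mean
$\theta_1$ and variance-covariance matrix Γ.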
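The matrix Γ quoted above can be checked with a short fixed-point iteration. The sketch below is plain Python, and its function names are illustrative. It applies $\Lambda(t+1) = A\Lambda(t)A^T + \Lambda$ starting from $\Lambda(0) = 0$. This reproduces the partial sums in (2.5), and it converges to the solution of (2.4) when every eigenvalue of A lies inside the unit circle. That condition holds for this A, whose eigenvalues are about 0.93 and -0.33.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def step(A, cov, Lam):
    """One step of (2.3): A cov A^T + Lam."""
    At = [list(col) for col in zip(*A)]
    prod = matmul(matmul(A, cov), At)
    return [[prod[i][j] + Lam[i][j] for j in range(len(A))] for i in range(len(A))]

def conditional_covariances(A, Lam, t_max):
    """[Lambda(1), ..., Lambda(t_max)] of (2.5), built recursively from Lambda(0) = 0."""
    cov = [[0.0] * len(A) for _ in A]
    out = []
    for _ in range(t_max):
        cov = step(A, cov, Lam)
        out.append(cov)
    return out

def stationary_covariance(A, Lam, tol=1e-12, max_iter=10_000):
    """Fixed point of Gamma = A Gamma A^T + Lam, eq. (2.4); assumes A is stable."""
    gamma = [[0.0] * len(A) for _ in A]
    for _ in range(max_iter):
        new = step(A, gamma, Lam)
        if max(abs(a - b) for r1, r2 in zip(new, gamma) for a, b in zip(r1, r2)) < tol:
            return new
        gamma = new
    raise RuntimeError("no convergence; check that A is stable")

A = [[0.1, 0.9], [0.4, 0.5]]
Lam = [[1.0, 0.0], [0.0, 1.0]]
Gamma = stationary_covariance(A, Lam)
print([[round(g, 1) for g in row] for row in Gamma])  # [[5.4, 3.8], [3.8, 4.5]]
```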

In each replication two independent vector random variables are
generated. One is Y(0), which has a normal distribution with mean $\theta_1$ and
variance-covariance matrix Γ. The other is U, whose components are two
independent standard normal random variables. For each time t = 1, 2, ..., 40,
Y(t) is calculated as

$$Y(t) = m(t) + H(t)U \qquad (2.23)$$

with $m(t) = \theta_1 + A^t(y(0)-\theta_1)$; Y(t) has the same distribution as a Signature
from Item 1 when the Signature at time 0 is also from Item 1. There are 1000
replications. Table A presents the fraction of replications for which the
incorrect classification is made of Item 2 being the one producing the
Signature at time t. These are the replications for which

$$\Big(\frac{|\Lambda(t)|}{|\Gamma|}\Big)^{1/2} > \exp\Big\{\tfrac{1}{2}(y(t)-\theta_2)^T\Gamma^{-1}(y(t)-\theta_2) - \tfrac{1}{2}(y(t)-m(t))^T\Lambda(t)^{-1}(y(t)-m(t))\Big\}. \qquad (2.24)$$



Note that the fractions are not independent, since common random numbers
are used.

The contours of the distribution in Figure 1 suggest that a misclassification is
more likely if $\theta_2 = (2,2)^T$ than if $\theta_2 = (-2,2)^T$; the fractions in
Table A support this. The fractions in Table A also suggest that the probability
of misclassification is an increasing function of t. This observation is
consistent with the fact that the variances of the components of Y(t) increase as t
increases.



TABLE A. FRACTION OF MISCLASSIFICATION ($\theta_1 = (1,1)^T$)

Time:                     1     2     3     4     5     10    20    30    40
$\theta_2 = (2,2)^T$      0.10  0.13  0.16  0.20  0.21  0.30  0.39  0.41  0.41
$\theta_2 = (-2,2)^T$     0.04  0.06  0.06  0.07  0.07  0.09  0.09  0.09  0.09
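The experiment behind Table A can be sketched as follows, under these assumptions:

- Python's `random` module replaces the generator used originally.
- Γ is approximated by $\Lambda(2000)$.
- $H(t)$ is taken to be the Cholesky factor of $\Lambda(t)$.
- Priors are equal, as in (2.24).

The helper names are hypothetical. Because the random numbers differ, the fractions will not match Table A digit for digit. The pattern should still be reproduced: fractions for $\theta_2 = (2,2)^T$ are larger, and they increase with t before levelling off.

```python
import math
import random

def mm(X, Y):
    """2x2 matrix product."""
    return [[X[i][0] * Y[0][j] + X[i][1] * Y[1][j] for j in range(2)] for i in range(2)]

def mv(X, v):
    """2x2 matrix times a 2-vector."""
    return [X[0][0] * v[0] + X[0][1] * v[1], X[1][0] * v[0] + X[1][1] * v[1]]

def chol(S):
    """Lower-triangular H with H H^T = S for a 2x2 positive-definite S."""
    h11 = math.sqrt(S[0][0])
    h21 = S[1][0] / h11
    return [[h11, 0.0], [h21, math.sqrt(S[1][1] - h21 * h21)]]

def det(S):
    return S[0][0] * S[1][1] - S[0][1] * S[1][0]

def quad_inv(S, v):
    """v^T S^{-1} v for a symmetric 2x2 S."""
    return (S[1][1] * v[0] ** 2 - 2 * S[0][1] * v[0] * v[1] + S[0][0] * v[1] ** 2) / det(S)

def misclassification_fractions(theta1, theta2, A, Lam, times, reps=1000, seed=1):
    """Fraction of replications satisfying (2.24), i.e. Item 2 wrongly chosen."""
    At = [[A[0][0], A[1][0]], [A[0][1], A[1][1]]]
    covs, cov = {}, [[0.0, 0.0], [0.0, 0.0]]
    for t in range(1, 2001):                      # Lambda(t) of (2.5); long run ~ Gamma
        cov = mm(mm(A, cov), At)
        cov = [[cov[i][j] + Lam[i][j] for j in range(2)] for i in range(2)]
        covs[t] = cov
    gamma = cov
    powers, P = {}, [[1.0, 0.0], [0.0, 1.0]]
    for t in range(1, max(times) + 1):            # A^t
        P = mm(A, P)
        powers[t] = P
    H0 = chol(gamma)
    rng = random.Random(seed)
    wrong = dict.fromkeys(times, 0)
    for _ in range(reps):
        u0 = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        u = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]   # same U for every t
        y0 = [theta1[k] + v for k, v in enumerate(mv(H0, u0))]
        dev0 = [y0[0] - theta1[0], y0[1] - theta1[1]]
        for t in times:
            m = [theta1[k] + v for k, v in enumerate(mv(powers[t], dev0))]   # m(t)
            y = [m[k] + v for k, v in enumerate(mv(chol(covs[t]), u))]      # (2.23)
            # (2.24) on the log scale; (y-m)^T Lambda(t)^{-1} (y-m) equals u^T u.
            lhs = 0.5 * math.log(det(covs[t]) / det(gamma))
            rhs = (0.5 * quad_inv(gamma, [y[0] - theta2[0], y[1] - theta2[1]])
                   - 0.5 * (u[0] ** 2 + u[1] ** 2))
            wrong[t] += lhs > rhs
    return {t: wrong[t] / reps for t in times}

A = [[0.1, 0.9], [0.4, 0.5]]
Lam = [[1.0, 0.0], [0.0, 1.0]]
times = [1, 2, 3, 4, 5, 10, 20, 30, 40]
for theta2 in [(2, 2), (-2, 2)]:
    print(theta2, misclassification_fractions((1, 1), theta2, A, Lam, times))
```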
3. CAUCHY UNIVARIATE MODEL



In this section we consider Bayesian classification for a time series model
having marginal distributions with a longer tail than the Gaussian
distribution.

We assume that

$$Y_i(t) = \theta_i + X_i(t)$$

with

$$X_i(t) = \rho_iX_i(t-1) + \epsilon_i(t)$$

where $|\rho_i| < 1$. Here $\{\epsilon_i(t)\}$ are independent sequences of independent identically
distributed Cauchy random variables with location parameter 0 and
precision $[(1-|\rho_i|)\alpha_i]^{-1/2}$, and $X_i(0)$ has a Cauchy distribution with parameters
0 and $\alpha_i^{-1/2}$. Under these assumptions $\{X_i(t);\ t = 0, 1, 2, \ldots\}$ is a stationary
sequence of random variables with marginal Cauchy distribution having
parameters 0 and $\alpha_i^{-1/2}$.

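It follows that

$$P\{Y_i(0) \in dy(0), Y_i(t) \in dy(t)\} = \frac{\alpha_i\,dy(0)}{\pi[\alpha_i^2 + (y(0)-\theta_i)^2]}\cdot\frac{\alpha_i(1-|\rho_i|^t)\,dy(t)}{\pi[(\alpha_i(1-|\rho_i|^t))^2 + (y(t)-\theta_i-\rho_i^t(y(0)-\theta_i))^2]}. \qquad (3.1)$$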

Let C(t) denote the identity of the item associated with the Signature at
time t and put $P\{C(t)=i\} = p_i(t)$; then

$$P\{Y(t) \in dy(t), C(t) = i \mid C(0) = i, Y(0) = y(0)\}$$
$$= p_i(t)\,\frac{\alpha_i(1-|\rho_i|^t)}{\pi[(\alpha_i(1-|\rho_i|^t))^2 + (y(t)-\theta_i-\rho_i^t(y(0)-\theta_i))^2]}\,dy(t)$$
$$= p_i(t)\,\frac{\alpha_i(t)}{\pi[\alpha_i(t)^2 + (y(t)-m_i(t))^2]}\,dy(t) \qquad (3.2)$$




where $\alpha_i(t) = \alpha_i(1-|\rho_i|^t)$ and $m_i(t) = \theta_i + \rho_i^t(y(0)-\theta_i)$, as in (3.2). It is natural to define
$m_j(t) = \theta_j$ and $\alpha_j(t) = \alpha_j$ for $j \ne i$. Hence, given Item i is associated with the
Signature at time 0, the posterior probability that Item j is associated with the
Signature observed at time t is

$$P\{C(t) = j \mid C(0) = i, Y(0) = y(0), Y(t) = y(t)\} = \frac{p_j(t)\,\alpha_j(t)[\alpha_j(t)^2 + (y(t)-m_j(t))^2]^{-1}}{\sum_{k=1}^{J} p_k(t)\,\alpha_k(t)[\alpha_k(t)^2 + (y(t)-m_k(t))^2]^{-1}}. \qquad (3.3)$$

3.2 The Probability of Making an Incorrect Classification

In this section we assume that the item associated with the Signature at
time t, given the last complete observation at time 0, is estimated to be the one
that maximizes the posterior probability (3.3). For simplicity we will
suppose there are J = 2 possible items with known parameters $\theta_1$ and $\theta_2$.
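Here is a minimal sketch of the Cauchy posterior (3.3) in plain Python. The function names are hypothetical, and the factor $1/\pi$ is dropped because it cancels. The parameter t is assumed to be a positive integer, so that $\rho_i^t$ is real even for negative $\rho_i$.

```python
def cauchy_kernel(y, m, a):
    """a / (a^2 + (y - m)^2); the common factor 1/pi cancels in (3.3)."""
    return a / (a * a + (y - m) ** 2)

def cauchy_posterior(y_t, y0, t, i, priors, thetas, alphas, rhos):
    """Posterior (3.3) over items j, given that item i produced y0 at time 0.

    Uses m_i(t) = theta_i + rho_i^t (y0 - theta_i) and alpha_i(t) = alpha_i (1 - |rho_i|^t)
    for the item seen at time 0, and m_j(t) = theta_j, alpha_j(t) = alpha_j otherwise.
    """
    weights = []
    for j, (p, th, al, rho) in enumerate(zip(priors, thetas, alphas, rhos)):
        if j == i:
            m, a = th + rho ** t * (y0 - th), al * (1.0 - abs(rho) ** t)
        else:
            m, a = th, al
        weights.append(p * cauchy_kernel(y_t, m, a))
    total = sum(weights)
    return [w / total for w in weights]

# Item 1 (index 0) seen at time 0 with y(0) = 1; a new Signature y(t) = 1 arrives at t = 1.
print(cauchy_posterior(1.0, 1.0, 1, 0, [0.5, 0.5], [1.0, 2.0], [1.0, 1.0], [0.5, 0.5]))  # [0.8, 0.2]
```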



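First

$$P\{Y(t) \in dy(t) \mid Y(0) = y(0), C(0) = 1, C(t) = l\} = \frac{\alpha_l(t)}{\pi[\alpha_l(t)^2 + (y(t)-m_l(t))^2]}\,dy(t), \qquad l = 1, 2, \qquad (3.4)$$

where

$$\alpha_1(t) = \alpha_1[1-|\rho_1|^t]; \qquad \alpha_2(t) = \alpha_2 \qquad (3.5)$$

$$m_1(t) = \theta_1 + \rho_1^t(y(0)-\theta_1); \qquad m_2(t) = \theta_2. \qquad (3.6)$$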



Note that given $Y(0) = y(0)$, $C(0) = 1$ and $C(t) = 1$,

$$Y(t) \overset{d}{=} [\theta_1 + \rho_1^t(y(0)-\theta_1)] + (1-|\rho_1|^t)\alpha_1W = m_1(t) + \alpha_1(t)W$$

where W is a Cauchy random variable with location parameter 0 and
precision 1. Consider the incorrect classification that estimates Item 2 as being associated with the Signature at time t, given that Item 1
is responsible for the Signatures at time 0 and time t and $Y(0) = y(0)$. Its probability is



$$P\{\text{classify as Item 2} \mid C(0) = 1, Y(0) = y(0), C(t) = 1\}$$
$$= P\Big\{p_2(t)\alpha_2(t)[\alpha_2(t)^2 + (Y(t)-m_2(t))^2]^{-1} > p_1(t)\alpha_1(t)[\alpha_1(t)^2 + (Y(t)-m_1(t))^2]^{-1} \,\Big|\, C(0) = 1, Y(0) = y(0), C(t) = 1\Big\}$$
$$= P\Big\{p_2(t)\,\alpha_2[\alpha_2^2 + (m_1(t)+\alpha_1(t)W-\theta_2)^2]^{-1} > p_1(t)\,\alpha_1(t)[\alpha_1(t)^2 + (\alpha_1(t)W)^2]^{-1}\Big\}$$
$$= P\Big\{\frac{p_2(t)}{p_1(t)}\,\alpha_2\alpha_1(t) > \frac{\alpha_2^2 + (m_1(t)+\alpha_1(t)W-\theta_2)^2}{1+W^2}\Big\}. \qquad (3.7)$$

Note that as $t \to 0$, $\alpha_1(t) \to 0$ and $m_1(t) \to y(0)$. Hence, the conditional probability
of a wrong classification tends to

$$P\Big\{0 > \frac{\alpha_2^2 + (y(0)-\theta_2)^2}{1+W^2}\Big\} = 0. \qquad (3.8)$$

As $t \to \infty$, $\alpha_1(t) \to \alpha_1$, $m_1(t) \to \theta_1$, and the conditional probability of a wrong
classification tends to
$$P\Big\{\frac{p_2(\infty)}{p_1(\infty)}\,\alpha_2\alpha_1(1+W^2) > \alpha_2^2 + (\alpha_1W + \theta_1 - \theta_2)^2\Big\}. \qquad (3.9)$$

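If $\alpha_2 = \alpha_1 = \alpha$ and $p_2(\infty) = p_1(\infty)$, then as $t \to \infty$

$$P\{\text{incorrect classification} \mid Y(0) = y(0), C(0) = 1, C(t) = 1\}$$
$$\to P\{\alpha^2(1+W^2) > \alpha^2 + (\alpha W + \theta_1 - \theta_2)^2\}$$
$$= P\Big\{W^2 > \Big(W + \frac{\theta_1-\theta_2}{\alpha}\Big)^2\Big\}$$
$$= P\Big\{W > \frac{|\theta_1-\theta_2|}{2\alpha}\Big\}, \qquad (3.10)$$

which increases as α increases and decreases as $|\theta_1-\theta_2|$ increases.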
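The tail probability (3.10) has the closed form $\tfrac12 - \pi^{-1}\arctan(|\theta_1-\theta_2|/(2\alpha))$, because a standard Cauchy W satisfies $P\{W > x\} = \tfrac12 - \pi^{-1}\arctan x$. The sketch below compares this with a direct Monte Carlo evaluation of (3.9) under equal priors. The function names, sample size and seed are illustrative choices.

```python
import math
import random

def limit_310(theta1, theta2, alpha):
    """(3.10): P{W > |theta1 - theta2| / (2 alpha)} for a standard Cauchy W."""
    return 0.5 - math.atan(abs(theta1 - theta2) / (2.0 * alpha)) / math.pi

def limit_39_mc(theta1, theta2, alpha, n=200_000, seed=7):
    """Monte Carlo version of (3.9) with equal priors and alpha_1 = alpha_2 = alpha."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(n):
        w = math.tan(math.pi * (rng.random() - 0.5))      # standard Cauchy draw
        wrong += alpha ** 2 * (1.0 + w * w) > alpha ** 2 + (alpha * w + theta1 - theta2) ** 2
    return wrong / n

print(round(limit_310(1.0, 2.0, 1.0), 4))   # 0.3524
print(limit_39_mc(1.0, 2.0, 1.0))
```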
4. ARE BAYESIAN CLASSIFICATION PROCEDURES ROBUST?

In this section the robustness of the univariate Cauchy and Gaussian 
classification procedures against misspecification of the form of the marginal 
distribution will be studied. 

4.1 Gaussian Data

In this subsection we assume that the Signatures of the Items form
Gaussian time series. In particular we assume that

$$Y_i(t) = \theta_i + X_i(t) \qquad (4.1)$$

with

$$X_i(t+1) = \rho_iX_i(t) + \epsilon_i(t) \qquad (4.2)$$

where $\{\epsilon_i(t)\}$ are independent identically distributed normal random variables
with mean 0 and variance $\sigma_i^2$, and $|\rho_i| < 1$. The independent random variable
$X_i(0)$ has a normal distribution with mean 0 and variance
$\sigma_i(\infty)^2 = \sigma_i^2/(1-\rho_i^2)$. Thus $\{X_i(t), t \ge 0\}$ is a stationary sequence of normal
random variables with mean 0 and variance $\sigma_i(\infty)^2$. Let C(t) be the identity of
the Item associated with the Signature at time t.

As was shown in Gaver and Jacobs (1989), the conditional distribution of
Y(t) given $Y(0) = y(0)$, $C(0) = i$, $C(t) = i$ is normal with mean

$$m_i(t) = \theta_i + (y(0)-\theta_i)\rho_i^t \qquad (4.3)$$

and standard deviation

$$\sigma_i(t) = \sigma_i(\infty)\sqrt{1-\rho_i^{2t}}. \qquad (4.4)$$
For simplicity we will assume $P\{C(t) = i\} = p_i(t) = p(t)$ and there are 2 Items
with parameters $\theta_1$ and $\theta_2$; thus, $p(t) = \tfrac{1}{2}$.

Suppose the Cauchy procedure is used to estimate the identity of the Item 
associated with the Signature at time t; that is, the Item which maximizes the 
posterior probability (3.3) is the estimate of the Item associated with the 
Signature. Hence, the probability of an incorrect classification is 

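$$P\{\text{classify as Item 2} \mid C(0) = 1, Y(0) = y(0), C(t) = 1\}$$
$$= P\Big\{\alpha_2[\alpha_2^2 + (m_1(t)+\sigma_1(t)Z-\theta_2)^2]^{-1} > \alpha_1(t)[\alpha_1(t)^2 + (\sigma_1(t)Z)^2]^{-1}\Big\}$$
$$= P\Big\{\alpha_2\alpha_1(t)[\alpha_2^2 + (m_1(t)+\sigma_1(t)Z-\theta_2)^2]^{-1} > \Big[1 + \Big(\frac{\sigma_1(t)Z}{\alpha_1(t)}\Big)^2\Big]^{-1}\Big\}$$
$$= P\Big\{\alpha_2\alpha_1(1-|\rho_1|^t)[\alpha_2^2 + (m_1(t)+\sigma_1(t)Z-\theta_2)^2]^{-1} > \Big[1 + \frac{\sigma_1(\infty)^2(1-\rho_1^{2t})}{\alpha_1^2(1-|\rho_1|^t)^2}Z^2\Big]^{-1}\Big\} \qquad (4.5)$$

where Z is a standard normal random variable.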
Note that as $t \to 0$, using $1-\rho_1^{2t} = (1-|\rho_1|^t)(1+|\rho_1|^t)$,

$$P\{\text{classify as Item 2} \mid C(0) = 1, Y(0) = y(0), C(t) = 1\}$$
$$= P\Big\{\alpha_2\alpha_1(1-|\rho_1|^t)[\alpha_2^2 + (m_1(t)+\sigma_1(t)Z-\theta_2)^2]^{-1} > \Big[1 + \frac{\sigma_1(\infty)^2(1+|\rho_1|^t)}{\alpha_1^2(1-|\rho_1|^t)}Z^2\Big]^{-1}\Big\}$$
$$= P\Big\{\alpha_2^2 + (m_1(t)+\sigma_1(t)Z-\theta_2)^2 < \alpha_1\alpha_2(1-|\rho_1|^t) + (1+|\rho_1|^t)\frac{\alpha_2\sigma_1(\infty)^2}{\alpha_1}Z^2\Big\}$$
$$\to P\Big\{\frac{\alpha_1[\alpha_2^2 + (y(0)-\theta_2)^2]}{2\alpha_2\sigma_1(\infty)^2} < Z^2\Big\}. \qquad (4.6)$$

Thus, the conditional probability of an incorrect classification does not tend to
0 as $t \to 0$, as it would if the correct model were used; see (3.6) of Gaver and Jacobs
(1989).
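The nonzero limit (4.6) can be checked numerically. The sketch below treats t as a continuous parameter, as the limit $t \to 0$ does, and assumes equal priors. The function `limit_46` evaluates $P\{Z^2 > c\}$ with $c = \alpha_1[\alpha_2^2 + (y(0)-\theta_2)^2]/(2\alpha_2\sigma_1(\infty)^2)$. The function `cauchy_rule_on_gaussian_mc` simulates the Cauchy decision rule on Gaussian data at a very small t.

The parameter values $\sigma = 1$, $\rho = 0.5$, $\theta_1 = 1$, $\theta_2 = 2$ and $\alpha_1 = \alpha_2 = 0.607$ are borrowed from Section 4.3 purely for illustration. The value $y(0) = 1$ is an arbitrary choice. The function names are hypothetical.

```python
import math
import random

def limit_46(y0, theta2, alpha1, alpha2, sigma1_inf):
    """t -> 0 limit (4.6): P{Z^2 > c} for standard normal Z."""
    c = alpha1 * (alpha2 ** 2 + (y0 - theta2) ** 2) / (2.0 * alpha2 * sigma1_inf ** 2)
    return math.erfc(math.sqrt(c / 2.0))   # P{|Z| > sqrt(c)}

def cauchy_rule_on_gaussian_mc(t, y0, theta1, theta2, alpha1, alpha2, sigma, rho,
                               n=100_000, seed=3):
    """Monte Carlo version of (4.5): Gaussian data from Item 1 at times 0 and t,
    classified by the Cauchy rule with alpha_1(t) = alpha1 (1 - |rho|^t), alpha_2(t) = alpha2."""
    s_inf = sigma / math.sqrt(1.0 - rho ** 2)
    s_t = s_inf * math.sqrt(1.0 - rho ** (2 * t))
    m1 = theta1 + rho ** t * (y0 - theta1)
    a1 = alpha1 * (1.0 - abs(rho) ** t)
    rng = random.Random(seed)
    wrong = 0
    for _ in range(n):
        y = m1 + s_t * rng.gauss(0.0, 1.0)
        wrong += alpha2 / (alpha2 ** 2 + (y - theta2) ** 2) > a1 / (a1 ** 2 + (y - m1) ** 2)
    return wrong / n

sigma_inf = 1.0 / math.sqrt(1.0 - 0.5 ** 2)
print(limit_46(1.0, 2.0, 0.607, 0.607, sigma_inf))
print(cauchy_rule_on_gaussian_mc(1e-6, 1.0, 1.0, 2.0, 0.607, 0.607, 1.0, 0.5))
```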

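Note that as $t \to \infty$

$$P\{\text{classify as Item 2} \mid C(0) = 1, Y(0) = y(0), C(t) = 1\} \to P\Big\{\alpha_2\alpha_1[\alpha_2^2 + (\theta_1 + \sigma_1(\infty)Z - \theta_2)^2]^{-1} > \Big[1 + \Big(\frac{\sigma_1(\infty)Z}{\alpha_1}\Big)^2\Big]^{-1}\Big\}. \qquad (4.7)$$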



If $\alpha_1 = \alpha_2 = 1$, then the above equals

$$P\{[1 + (\sigma_1(\infty)Z + \theta_1 - \theta_2)^2]^{-1} > [1 + (\sigma_1(\infty)Z)^2]^{-1}\}$$
$$= P\{(\sigma_1(\infty)Z)^2 > (\sigma_1(\infty)Z + \theta_1 - \theta_2)^2\}$$
$$= P\Big\{Z^2 > \Big(Z + \frac{\theta_1-\theta_2}{\sigma_1(\infty)}\Big)^2\Big\}$$
$$= P\Big\{Z > \frac{|\theta_1-\theta_2|}{2\sigma_1(\infty)}\Big\} \qquad (4.8)$$

which is the same as if the correct model had been used to make the decision;
see (3.9) of Gaver and Jacobs (1989).
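Here is a quick Monte Carlo check of (4.7) against the normal tail (4.8). It is a sketch with illustrative names and parameters ($\sigma = 1$, $\rho = 0.5$, $\theta_1 = 1$, $\theta_2 = 2$).

```python
import math
import random

def limit_48(theta1, theta2, sigma_inf):
    """(4.8): P{Z > |theta1 - theta2| / (2 sigma_inf)} for standard normal Z."""
    x = abs(theta1 - theta2) / (2.0 * sigma_inf)
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def limit_47_mc(theta1, theta2, alpha1, alpha2, sigma_inf, n=200_000, seed=11):
    """Monte Carlo version of the t -> infinity limit (4.7)."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        lhs = alpha2 * alpha1 / (alpha2 ** 2 + (theta1 + sigma_inf * z - theta2) ** 2)
        rhs = 1.0 / (1.0 + (sigma_inf * z / alpha1) ** 2)
        wrong += lhs > rhs
    return wrong / n

sigma_inf = 1.0 / math.sqrt(1.0 - 0.5 ** 2)     # sigma = 1, rho = 0.5
print(limit_48(1.0, 2.0, sigma_inf), limit_47_mc(1.0, 2.0, 1.0, 1.0, sigma_inf))
```

Any common value $\alpha_1 = \alpha_2$ gives the same answer. With equal values, the inequality in (4.7) reduces to $(\theta_1 + \sigma_1(\infty)Z - \theta_2)^2 < (\sigma_1(\infty)Z)^2$.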

Now we consider the case in which a different Item is associated with the
Signature at time t than the one associated with the Signature at time 0. Once
again for simplicity we assume $\sigma_1 = \sigma_2 = \sigma$ and $\rho_1 = \rho_2 = \rho$ with $|\rho| < 1$. For the
Cauchy model we assume $\alpha_1 = \alpha_2 = \alpha$. Let $\sigma(\infty) = \sigma/\sqrt{1-\rho^2}$ and $\sigma(t) = \sigma(\infty)\sqrt{1-\rho^{2t}}$. We
will assume Item 1 is associated with the Signature at time 0 and Item 2 is
associated with the Signature at time t.

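For the Gaussian classification procedure of Gaver and Jacobs (1989), the
probability of an incorrect classification is

$$P\{\text{classify as 1} \mid Y(0) = y(0), C(0) = 1, C(t) = 2\}$$
$$= P\Big\{(1-\rho^{2t})^{-1/2}\exp\{-\tfrac{1}{2}[\theta_2 + \sigma(\infty)Z - (\theta_1 + (y(0)-\theta_1)\rho^t)]^2/(\sigma(\infty)^2(1-\rho^{2t}))\} > \exp\{-\tfrac{1}{2}(\theta_2 + \sigma(\infty)Z - \theta_2)^2/\sigma(\infty)^2\}\Big\}$$
$$= P\Big\{-\tfrac{1}{2}\ln(1-\rho^{2t}) - \tfrac{1}{2}[\theta_2 + \sigma(\infty)Z - (\theta_1 + (y(0)-\theta_1)\rho^t)]^2/\{\sigma(\infty)^2(1-\rho^{2t})\} > -\tfrac{1}{2}(\sigma(\infty)Z)^2/\sigma(\infty)^2\Big\}$$
$$= P\Big\{(1-\rho^{2t})\sigma(\infty)^2\ln(1-\rho^{2t}) + [\theta_2 + \sigma(\infty)Z - (\theta_1 + (y(0)-\theta_1)\rho^t)]^2 < (1-\rho^{2t})(\sigma(\infty)Z)^2\Big\} \qquad (4.9)$$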
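where Z is a standard normal random variable. As $t \to 0$, since $(1-\rho^{2t})\ln(1-\rho^{2t}) \to 0$,

$$P\{\text{classify as 1} \mid Y(0) = y(0), C(0) = 1, C(t) = 2\} \to P\Big\{\Big(Z + \frac{\theta_2 - y(0)}{\sigma(\infty)}\Big)^2 < 0\Big\} = 0; \qquad (4.10)$$

that is, suppose the Item associated with the Signature at time t is different from the
one associated with the Signature at time 0. Then, as $t \to 0$, the probability of an
incorrect classification using the Gaussian procedure on Gaussian data tends
to 0.

As $t \to \infty$, the probability of an incorrect classification tends to

$$P\Big\{Z^2 > \Big(Z + \frac{\theta_2-\theta_1}{\sigma(\infty)}\Big)^2\Big\} = P\Big\{Z < -\frac{|\theta_2-\theta_1|}{2\sigma(\infty)}\Big\}.$$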



Suppose now the Cauchy classification procedure is used on the Gaussian
data with Item 2 associated with the Signature at time t and Item 1 associated
with the Signature at time 0. The probability of an incorrect classification is

$$P\{\text{classify as 1} \mid Y(0) = y(0), C(0) = 1, C(t) = 2\}$$
$$= P\Big\{\alpha(1-|\rho|^t)\big[[\alpha(1-|\rho|^t)]^2 + [\theta_2 + \sigma(\infty)Z - [\theta_1 + (y(0)-\theta_1)\rho^t]]^2\big]^{-1} > \alpha[\alpha^2 + (\theta_2 + \sigma(\infty)Z - \theta_2)^2]^{-1}\Big\} \qquad (4.11)$$
$$= P\Big\{(1-|\rho|^t)[\alpha^2 + (\sigma(\infty)Z)^2] > [\alpha(1-|\rho|^t)]^2 + [\theta_2 + \sigma(\infty)Z - [\theta_1 + (y(0)-\theta_1)\rho^t]]^2\Big\}. \qquad (4.12)$$

As $t \to 0$

$$P\{\text{classify as 1} \mid Y(0) = y(0), C(0) = 1, C(t) = 2\} \to P\{0 > [\theta_2 + \sigma(\infty)Z - y(0)]^2\} = 0; \qquad (4.13)$$
that is, as $t \to 0$, the probability of a correct classification for the Cauchy
procedure tends to 1 for the case in which the Item associated with the
Signature at time t is different from the one associated with the Signature at
time 0, even though the data are Gaussian.

As $t \to \infty$

$$P\{\text{classify as 1} \mid Y(0) = y(0), C(0) = 1, C(t) = 2\} \to P\Big\{Z^2 > \Big(Z + \frac{\theta_2-\theta_1}{\sigma(\infty)}\Big)^2\Big\} = P\Big\{Z < -\frac{|\theta_2-\theta_1|}{2\sigma(\infty)}\Big\}. \qquad (4.14)$$



Hence, as $t \to \infty$ the probability of an incorrect identification tends to the same
normal tail probability for both the Cauchy and Gaussian classification
procedures.

Thus, for the two limiting cases $t \to 0$ and $t \to \infty$, both the Cauchy and
Gaussian procedures have the same misclassification probabilities for the
scenario in which the Item associated with the Signature at time t is different
from the one associated with the Signature at time 0. Note that these are
theoretical limiting results with all parameters known.

To investigate further the behavior of the two classification procedures
on Gaussian data when Item 1 is associated with the Signature at time 0 and
Item 2 is associated with the Signature at time t, let

$$g_t(y(0),Z) = (\theta_2 + \sigma(\infty)Z - [\theta_1 + (y(0)-\theta_1)\rho^t])^2.$$

The conditional probability of an incorrect classification by the Gaussian
procedure is, from (4.9),

$$P\{\text{classify as Item 1} \mid Y(0) = y(0), C(0) = 1, C(t) = 2\}$$
$$= P\{(1-\rho^{2t})\sigma(\infty)^2\ln(1-\rho^{2t}) < (1-\rho^{2t})(\sigma(\infty)Z)^2 - g_t(y(0),Z)\}$$
$$\ge P\{(1-\rho^{2t})\sigma(\infty)^2\ln(1-\rho^{2t}) < (1-|\rho|^t)(\sigma(\infty)Z)^2 - g_t(y(0),Z)\}$$
$$\ge P\{-\alpha^2(1-|\rho|^t)|\rho|^t < (1-|\rho|^t)(\sigma(\infty)Z)^2 - g_t(y(0),Z)\}$$

for t sufficiently close to 0. The first inequality uses $1-\rho^{2t} \ge 1-|\rho|^t$. The second holds because, as $t \to 0$, $|(1-\rho^{2t})\ln(1-\rho^{2t})|$ is eventually much larger than $\alpha^2(1-|\rho|^t)|\rho|^t$. From (4.12) it follows that the conditional
probability of misclassification for the Cauchy procedure is

$$P\{(1-|\rho|^t)(\sigma(\infty)Z)^2 - g_t(y(0),Z) > \alpha^2(1-|\rho|^t)[1-|\rho|^t-1]\},$$

which is exactly the last probability above.

Hence for t sufficiently small, the incorrect Cauchy procedure will tend to 
have fewer misclassifications than the Gaussian procedure applied to 
Gaussian data in the scenario in which different Items are producing the 
Signatures at time 0 and t. 
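Because the argument above is pathwise, it can be checked with common random numbers. Suppose the sufficient condition $\sigma(\infty)^2(1-\rho^{2t})\ln(1-\rho^{2t}) \le -\alpha^2(1-|\rho|^t)|\rho|^t$ holds. Then every draw misclassified by the Cauchy rule is also misclassified by the Gaussian rule.

The sketch below assumes equal priors and draws $y(0)$ from the stationary distribution of Item 1. Its names and parameter values are illustrative; $\alpha = 0.607$ is borrowed from Section 4.3.

```python
import math
import random

def small_t_condition(t, sigma, rho, alpha):
    """Sufficient condition used above:
    sigma(inf)^2 (1 - rho^{2t}) ln(1 - rho^{2t}) <= -alpha^2 (1 - |rho|^t) |rho|^t."""
    x = 1.0 - rho ** (2 * t)
    s_inf2 = sigma ** 2 / (1.0 - rho ** 2)
    return s_inf2 * x * math.log(x) <= -alpha ** 2 * (1.0 - abs(rho) ** t) * abs(rho) ** t

def compare_on_gaussian_data(t, theta1, theta2, sigma, rho, alpha, n=50_000, seed=5):
    """Rates of wrongly choosing Item 1 when Item 1 produced Y(0) and Item 2 produced Y(t),
    for the Gaussian rule (4.9) and the Cauchy rule (4.12); equal priors, common draws."""
    s_inf = sigma / math.sqrt(1.0 - rho ** 2)
    s_t = s_inf * math.sqrt(1.0 - rho ** (2 * t))
    a_t = alpha * (1.0 - abs(rho) ** t)
    rng = random.Random(seed)
    wrong_g = wrong_c = 0
    for _ in range(n):
        y0 = theta1 + s_inf * rng.gauss(0.0, 1.0)     # stationary Y(0) from Item 1
        y = theta2 + s_inf * rng.gauss(0.0, 1.0)      # Y(t) from Item 2 in steady state
        m1 = theta1 + rho ** t * (y0 - theta1)
        g1 = -math.log(s_t) - 0.5 * ((y - m1) / s_t) ** 2
        g2 = -math.log(s_inf) - 0.5 * ((y - theta2) / s_inf) ** 2
        wrong_g += g1 > g2
        wrong_c += a_t / (a_t ** 2 + (y - m1) ** 2) > alpha / (alpha ** 2 + (y - theta2) ** 2)
    return wrong_g / n, wrong_c / n

print(small_t_condition(1, 1.0, 0.5, 0.607))
print(compare_on_gaussian_data(1, 1.0, 2.0, 1.0, 0.5, 0.607))
```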

4.2 Cauchy Data 

In this subsection we assume the Signatures form time series with
Cauchy marginal distributions as in Section 3. In particular, we assume that

$$Y_i(t) = \theta_i + X_i(t) \qquad (4.15)$$

with

$$X_i(t+1) = \rho_iX_i(t) + \epsilon_i(t) \qquad (4.16)$$

where $\{\epsilon_i(t)\}$ are independent identically distributed Cauchy random variables
with location 0 and precision $[(1-|\rho_i|)\alpha_i]^{-1/2}$, with $|\rho_i| < 1$. The independent
random variable $X_i(0)$ has a Cauchy distribution with location 0 and precision
$\alpha_i^{-1/2}$. Under these assumptions $\{X_i(t)\}$ is a stationary sequence of random
variables with marginal Cauchy distribution having parameters 0 and $\alpha_i^{-1/2}$.
Further, the conditional distribution of Y(t) given $Y(0) = y(0)$, $C(0) = i$, $C(t) = i$
is Cauchy with location parameter

$$m_i(t) = \theta_i + \rho_i^t(y(0)-\theta_i) \qquad (4.17)$$

and precision parameter $\alpha_i(t)^{-1/2}$ with

$$\alpha_i(t) = \alpha_i(1-|\rho_i|^t). \qquad (4.18)$$

Let C(t) be the identity of the Item associated with the Signature at time t. 

For simplicity we will assume there are two items with parameters $\theta_1$ and
$\theta_2$. Further, $P\{C(t) = i\} = p_i(t) = p(t)$.

Suppose the Gaussian procedure of Gaver and Jacobs (1989) is used to 
estimate the identity of the item associated with the Signature at time t; that 
is, the item which maximizes (2.12) of Gaver and Jacobs (1989) is the estimate 
of the Item associated with the Signature at time t. Hence, the probability of 
an incorrect classification is 

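$$P\{\text{classify as Item 2} \mid C(0) = 1, Y(0) = y(0), C(t) = 1\}$$
$$= P\Big\{\sigma_2(\infty)^{-1}\exp\Big\{-\frac{(Y(t)-\theta_2)^2}{2\sigma_2(\infty)^2}\Big\} > \sigma_1(t)^{-1}\exp\Big\{-\frac{(Y(t)-m_1(t))^2}{2\sigma_1(t)^2}\Big\} \,\Big|\, C(0) = 1, Y(0) = y(0), C(t) = 1\Big\} \qquad (4.19)$$

where

$$m_1(t) = \theta_1 + \rho_1^t(y(0)-\theta_1) \qquad (4.20)$$

$$\sigma_1(t) = \sigma_1(\infty)\sqrt{1-\rho_1^{2t}} \qquad (4.21)$$

$$\sigma_1(\infty) = \sigma_1/\sqrt{1-\rho_1^2}; \qquad \sigma_2(\infty) = \sigma_2/\sqrt{1-\rho_2^2} \qquad (4.22)$$

and $\sigma_i$, i = 1, 2, are the assumed standard deviations of the normal
distributions. We will assume $\sigma_1 = \sigma_2 = \sigma$; $\rho_1 = \rho_2 = \rho$; $\alpha_1 = \alpha_2 = \alpha$. Hence,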



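$$P\{\text{classify as Item 2} \mid C(0) = 1, Y(0) = y(0), C(t) = 1\}$$
$$= P\Big\{\frac{1}{\sigma(\infty)}\exp\Big\{-\frac{1}{2}\Big[\frac{m_1(t)+\alpha(t)W-\theta_2}{\sigma(\infty)}\Big]^2\Big\} > \frac{1}{\sigma(t)}\exp\Big\{-\frac{1}{2}\Big[\frac{\alpha(t)W}{\sigma(t)}\Big]^2\Big\}\Big\} \qquad (4.23)$$

where W is a Cauchy random variable with location parameter 0 and
precision 1, and $\alpha(t) = \alpha(1-|\rho|^t)$.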

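$$P\{\text{classify as Item 2} \mid C(0) = 1, Y(0) = y(0), C(t) = 1\}$$
$$= P\Big\{(1-\rho^{2t})^{1/2} > \exp\Big\{-\frac{(\alpha(t)W)^2}{2(1-\rho^{2t})\sigma(\infty)^2} + \frac{(\rho^t(y(0)-\theta_1) + (\theta_1-\theta_2) + \alpha(t)W)^2}{2\sigma(\infty)^2}\Big\}\Big\}$$
$$= P\Big\{(1-\rho^{2t})^{1/2} > \exp\Big\{-\frac{(1-|\rho|^t)\alpha^2W^2}{2(1+|\rho|^t)\sigma(\infty)^2} + \frac{(\rho^t(y(0)-\theta_1) + (\theta_1-\theta_2) + \alpha(t)W)^2}{2\sigma(\infty)^2}\Big\}\Big\} \qquad (4.24)$$

As $t \to 0$ this tends to

$$P\Big\{0 > \exp\Big\{\frac{(y(0)-\theta_2)^2}{2\sigma(\infty)^2}\Big\}\Big\} = 0;$$

thus, the probability of misclassification tends to zero as $t \to 0$ even though the
incorrect model is being used; the correct Cauchy procedure also has a
probability of misclassification tending to zero as $t \to 0$. As $t \to \infty$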



$$P\{\text{classify as Item 2} \mid C(0) = 1, Y(0) = y(0), C(t) = 1\}$$
$$\to P\Big\{1 > \exp\Big\{-\frac{\alpha^2W^2}{2\sigma(\infty)^2} + \frac{[\theta_1-\theta_2+\alpha W]^2}{2\sigma(\infty)^2}\Big\}\Big\}$$
$$= P\Big\{1 > \exp\Big\{\frac{1}{2\sigma(\infty)^2}[(\theta_1-\theta_2)^2 + 2(\theta_1-\theta_2)\alpha W]\Big\}\Big\}$$
$$= P\{0 > \tfrac{1}{2}(\theta_1-\theta_2)^2 + (\theta_1-\theta_2)\alpha W\}$$
$$= P\Big\{W > \frac{|\theta_1-\theta_2|}{2\alpha}\Big\},$$

which is the same as (3.10), the corresponding probability when the correct
Cauchy procedure is used.

To further explore the behavior as $t \to 0$, let

$$g_t(W,y(0)) = [\rho^t(y(0)-\theta_1) + (\theta_1-\theta_2) + \alpha(1-|\rho|^t)W]^2$$

and

$$B(t) = \alpha^2(1-|\rho|^t)W^2.$$

For t small, (4.24) becomes

$$P\{\text{classify as Item 2} \mid Y(0) = y(0), C(0) = 1, C(t) = 1\}$$
$$= P\{\sigma(\infty)^2(1+|\rho|^t)\ln(1-\rho^{2t}) + B(t) > (1+|\rho|^t)g_t(W,y(0))\}$$
$$\le P\{-\alpha^2|\rho|^t + B(t) > (1+|\rho|^t)g_t(W,y(0))\}$$
$$\le P\{-\alpha^2|\rho|^t + B(t) > g_t(W,y(0))\}$$
$$= P\{\alpha^2(1-|\rho|^t) + B(t) > \alpha^2 + g_t(W,y(0))\}$$
$$= P\{(\alpha(1-|\rho|^t))^2 + (1-|\rho|^t)B(t) > (1-|\rho|^t)[\alpha^2 + g_t(W,y(0))]\}$$
$$= P\{\alpha[\alpha^2 + g_t(W,y(0))]^{-1} > \alpha(1-|\rho|^t)\big[[\alpha(1-|\rho|^t)]^2 + (\alpha(1-|\rho|^t)W)^2\big]^{-1}\}$$
which is the conditional probability of misclassification for the Cauchy
procedure on Cauchy data. Hence for small t, the incorrect Gaussian 
procedure will tend to have fewer misclassifications than the correct Cauchy 
procedure for the scenario in which the same item is associated with the 
Signature at both times. 
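The same pathwise comparison can be sketched for Cauchy data when Item 1 produces both Signatures. Suppose $\sigma(\infty)^2(1+|\rho|^t)\ln(1-\rho^{2t}) \le -\alpha^2|\rho|^t$. Then every draw misclassified by the Gaussian rule is also misclassified by the Cauchy rule. The names, the equal priors and the values σ = α = 1, ρ = 0.5 are illustrative assumptions.

```python
import math
import random

def small_t_condition_same_item(t, sigma, rho, alpha):
    """Sufficient condition used above:
    sigma(inf)^2 (1 + |rho|^t) ln(1 - rho^{2t}) <= -alpha^2 |rho|^t."""
    s_inf2 = sigma ** 2 / (1.0 - rho ** 2)
    lhs = s_inf2 * (1.0 + abs(rho) ** t) * math.log(1.0 - rho ** (2 * t))
    return lhs <= -alpha ** 2 * abs(rho) ** t

def compare_on_cauchy_data_same_item(t, theta1, theta2, sigma, rho, alpha, n=50_000, seed=9):
    """Rates of wrongly choosing Item 2 when Item 1 produced both Y(0) and Y(t) under the
    Cauchy AR(1) model, for the Gaussian rule (4.19) and the Cauchy rule; equal priors."""
    s_inf = sigma / math.sqrt(1.0 - rho ** 2)
    s_t = s_inf * math.sqrt(1.0 - rho ** (2 * t))
    a_t = alpha * (1.0 - abs(rho) ** t)
    rng = random.Random(seed)
    cauchy = lambda: math.tan(math.pi * (rng.random() - 0.5))   # standard Cauchy draw
    wrong_g = wrong_c = 0
    for _ in range(n):
        y0 = theta1 + alpha * cauchy()            # stationary Y(0) from Item 1
        m1 = theta1 + rho ** t * (y0 - theta1)
        y = m1 + a_t * cauchy()                   # Y(t) from Item 1, as in (4.17)-(4.18)
        g1 = -math.log(s_t) - 0.5 * ((y - m1) / s_t) ** 2
        g2 = -math.log(s_inf) - 0.5 * ((y - theta2) / s_inf) ** 2
        wrong_g += g2 > g1
        wrong_c += alpha / (alpha ** 2 + (y - theta2) ** 2) > a_t / (a_t ** 2 + (y - m1) ** 2)
    return wrong_g / n, wrong_c / n

print(small_t_condition_same_item(1, 1.0, 0.5, 1.0))
print(compare_on_cauchy_data_same_item(1, 1.0, 2.0, 1.0, 0.5, 1.0))
```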

Now we consider the case in which the Item associated with the
Signature at time t is different from the one associated with the Signature at
time 0. Once again for simplicity we assume $\alpha_1 = \alpha_2 = \alpha$ and $\rho_1 = \rho_2 = \rho$ with
$|\rho| < 1$. For the Gaussian model we assume $\sigma_1 = \sigma_2 = \sigma$. Let $\sigma(\infty) = \sigma/(1-\rho^2)^{0.5}$ and
$\sigma(t) = \sigma(\infty)(1-\rho^{2t})^{0.5}$. We will assume Item 1 is associated with the Signature
at time 0 and Item 2 is associated with the Signature at time t.

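For the Gaussian classification procedure of Gaver and Jacobs (1989) the
probability of an incorrect classification is

$$P\{\text{classify as 1} \mid Y(0) = y(0), C(0) = 1, C(t) = 2\}$$
$$= P\Big\{(1-\rho^{2t})^{-0.5}\exp\{-\tfrac{1}{2}[\theta_2 + \alpha W - (\theta_1 + \rho^t(y(0)-\theta_1))]^2/(\sigma(\infty)^2(1-\rho^{2t}))\} > \exp\{-\tfrac{1}{2}(\theta_2 + \alpha W - \theta_2)^2/\sigma(\infty)^2\}\Big\}$$
$$= P\Big\{-\tfrac{1}{2}\ln(1-\rho^{2t}) - \tfrac{1}{2}[\theta_2 + \alpha W - (\theta_1 + \rho^t(y(0)-\theta_1))]^2/\{\sigma(\infty)^2(1-\rho^{2t})\} > -\tfrac{1}{2}(\alpha W)^2/\sigma(\infty)^2\Big\}$$
$$= P\Big\{\sigma(\infty)^2(1-\rho^{2t})\ln(1-\rho^{2t}) + (\theta_2 + \alpha W - (\theta_1 + \rho^t(y(0)-\theta_1)))^2 < (\alpha W)^2(1-\rho^{2t})\Big\} \qquad (4.26)$$

where W is a standard Cauchy random variable.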

As $t \to 0$, the probability of an incorrect classification

$$P\{\text{classify as 1} \mid Y(0) = y(0), C(0) = 1, C(t) = 2\} \to P\{(\theta_2 + \alpha W - y(0))^2 < 0\} = 0.$$

Hence as $t \to 0$, the probability of an incorrect classification tends to 0 for the
Gaussian procedure on Cauchy data.

As $t \to \infty$, the probability of an incorrect decision

$$P\{\text{classify as 1} \mid Y(0) = y(0), C(0) = 1, C(t) = 2\} \to P\{(\theta_2 + \alpha W - \theta_1)^2 < (\alpha W)^2\}$$
$$= P\Big\{\Big(W + \frac{\theta_2-\theta_1}{\alpha}\Big)^2 < W^2\Big\} = P\Big\{W > \frac{|\theta_2-\theta_1|}{2\alpha}\Big\}.$$



Suppose now the Cauchy classification procedure is used on the Cauchy
data with Item 2 associated with the Signature at time t and Item 1 associated
with the Signature at time 0. The probability of an incorrect classification is

$$P\{\text{classify as 1} \mid Y(0) = y(0), C(0) = 1, C(t) = 2\}$$
$$= P\Big\{\alpha(1-|\rho|^t)\big[\alpha^2(1-|\rho|^t)^2 + [\theta_2 + \alpha W - (\theta_1 + \rho^t(y(0)-\theta_1))]^2\big]^{-1} > \alpha[\alpha^2 + (\theta_2 + \alpha W - \theta_2)^2]^{-1}\Big\}$$
$$= P\Big\{(1-|\rho|^t)[\alpha^2 + (\alpha W)^2] > \alpha^2(1-|\rho|^t)^2 + [\theta_2 + \alpha W - (\theta_1 + \rho^t(y(0)-\theta_1))]^2\Big\}. \qquad (4.27)$$

As $t \to 0$, the probability of an incorrect classification

$$P\{\text{classify as 1} \mid Y(0) = y(0), C(0) = 1, C(t) = 2\} \to P\{0 > (\theta_2 + \alpha W - y(0))^2\} = 0.$$

Thus, the probability of an incorrect identification using the Cauchy
procedure tends to 0 as $t \to 0$ for the case in which the Item associated with the
Signature at time t is different from the Item associated with the Signature at
time 0.

As $t \to \infty$ the probability of an incorrect identification

$$P\{\text{classify as 1} \mid Y(0) = y(0), C(0) = 1, C(t) = 2\} \to P\Big\{W^2 > \Big(W + \frac{\theta_2-\theta_1}{\alpha}\Big)^2\Big\} = P\Big\{W > \frac{|\theta_2-\theta_1|}{2\alpha}\Big\},$$

the same as for the Gaussian procedure.

Hence, for the two limiting cases $t \to 0$ and $t \to \infty$ both the Cauchy and
Gaussian procedures have the same misclassification probabilities for the case
in which the Item associated with the Signature at time t is different from the
one associated with the Signature at time 0. Note that these are theoretical
limiting results with all parameters known.

To further explore the differences between the Gaussian and Cauchy
procedures for the scenario of different Items associated with Signatures and
Cauchy data, let

$$g_t(W,y(0)) = [\alpha W + \theta_2 - \theta_1 - \rho^t(y(0)-\theta_1)]^2. \qquad (4.28)$$

From (4.26), for the Gaussian procedure the probability of an incorrect
classification is

$$P\{\text{classify as 1} \mid Y(0) = y(0), C(0) = 1, C(t) = 2\} = P\{\sigma(\infty)^2(1-\rho^{2t})\ln(1-\rho^{2t}) < (\alpha W)^2(1-\rho^{2t}) - g_t(W,y(0))\}.$$

For the Cauchy procedure, the probability of an incorrect classification is

$$P\{\text{classify as 1} \mid Y(0) = y(0), C(0) = 1, C(t) = 2\}$$
$$= P\{\alpha^2(1-|\rho|^t)[1-|\rho|^t-1] < (1-|\rho|^t)(\alpha W)^2 - g_t(W,y(0))\}$$
$$= P\{-\alpha^2(1-|\rho|^t)|\rho|^t < (1-|\rho|^t)(\alpha W)^2 - g_t(W,y(0))\}$$
$$\le P\{\sigma(\infty)^2(1-\rho^{2t})\ln(1-\rho^{2t}) < (1-|\rho|^t)(\alpha W)^2 - g_t(W,y(0))\}$$
$$\le P\{\sigma(\infty)^2(1-\rho^{2t})\ln(1-\rho^{2t}) < (1-\rho^{2t})(\alpha W)^2 - g_t(W,y(0))\}$$

for t sufficiently small. Thus, for small t the Gaussian procedure will tend to
have more incorrect classifications than the Cauchy procedure for the
scenario of Cauchy data with the Item associated with the Signature at time t
being different from the one associated with the Signature at time 0. This
effect is made stronger by the fact that if the Gaussian procedure is used then
an estimate of $\sigma(\infty)^2$ will be needed. An estimate of $\sigma(\infty)^2$ for Cauchy data
will tend to be very large, since the Cauchy distribution does not have a finite
variance. This effect will be seen in the simulations of the next subsection.
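The point about estimating $\sigma(\infty)^2$ can be illustrated with a simulated Cauchy AR(1) path. The sketch below uses illustrative names. Its innovations are taken to have scale $(1-|\rho|)\alpha$, so that the marginal scale stays α, matching the model of Section 3. It prints the sample variance of growing prefixes of one path, which does not settle down. Alongside, it prints the median of $|Y(t)-\theta|$, which stays near α.

```python
import math
import random
import statistics

def cauchy_ar1(n, theta, alpha, rho, seed=2):
    """Simulate Y(t) = theta + X(t), X(t+1) = rho X(t) + eps(t), with Cauchy eps of scale
    (1 - |rho|) alpha and X(0) Cauchy with scale alpha, so X(t) stays Cauchy with scale alpha."""
    rng = random.Random(seed)
    draw = lambda scale: scale * math.tan(math.pi * (rng.random() - 0.5))
    x = draw(alpha)
    ys = [theta + x]
    for _ in range(n - 1):
        x = rho * x + draw((1.0 - abs(rho)) * alpha)
        ys.append(theta + x)
    return ys

ys = cauchy_ar1(100_000, theta=1.0, alpha=1.0, rho=0.5)
for n in (100, 1_000, 10_000, 100_000):
    print(n, statistics.variance(ys[:n]))
print(statistics.median(abs(y - 1.0) for y in ys))   # stays near alpha = 1
```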

4.3 Results of simulation experiments 

This subsection reports the results of simulation experiments that assess the behavior of the Gaussian and Cauchy classification procedures when they are confronted with data from the other distribution. For simplicity we assume there are two Items. In subsection a the autoregressive process producing the data is Gaussian; in subsection b it is Cauchy. In both subsections, classification procedures based on the Cauchy and on the Gaussian distributional assumptions are assessed. In all cases $\rho_1 = \rho_2 = 0.5$, $\theta_1 = 1$, $\theta_2 = 2$. The simulations use the LLRANDOM random number generator; cf. Lewis and Uribe [1981].
a. Gaussian Data

The simulation in this subsection uses data from a Gaussian autoregressive process. We assume that the means of the two Signatures, $\theta_1$ and $\theta_2$, are known and that $\rho_1 = \rho_2 = \rho$ is also known. It remains to assign values to the (presumed known) scale parameters of the two classification procedures. In particular, what should the scale parameter $a = a_1 = a_2$ of the Cauchy procedure be when it is applied to Gaussian data? To obtain reasonable values for $\sigma_1 = \sigma_2 = \sigma$ for the Gaussian classification procedure and $a = a_1 = a_2$ for the Cauchy classification procedure, the following simulation experiment was performed. The experiment has 100 replications. In each replication, 100 independent standard normal random numbers are generated. For each replication, the standard deviation of the data is computed, and the maximum likelihood estimate of $a$ is obtained numerically assuming a Cauchy density function of the form

$$f(x) = \frac{1}{\pi}\,\frac{a}{a^2 + x^2}.$$

The medians of the 100 estimates of $a$ and of the 100 standard deviations are then calculated. The values obtained are $\hat\sigma_M = 1.0$ and $\hat a_M = 0.607$. Note that the estimates of $a$ are obtained under the incorrect model assumption that the Gaussian data are Cauchy. The value $\hat\sigma_M$ is used in the Gaussian procedure to classify observations; the value $\hat a_M$ is used in the Cauchy procedure.
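A minimal Python sketch of this calibration experiment follows. It assumes NumPy's `default_rng` generator in place of LLRANDOM, the sample standard deviation with divisor $n-1$, and bisection as the numerical method (the report says only that the estimate was obtained numerically). With the location fixed at 0, setting the derivative of the log-likelihood to zero gives $\sum_i x_i^2/(x_i^2 + a^2) = n/2$; the left side decreases in $a$, so the root is unique and bisection finds it. The helper names are hypothetical, and the printed medians will differ from the report's in the later decimals.

```python
import numpy as np


def cauchy_scale_mle(x, tol=1e-10):
    """Maximum likelihood estimate of a for f(x) = a / (pi * (a**2 + x**2)).

    The likelihood equation is sum(x**2 / (x**2 + a**2)) = n / 2. Its left
    side decreases in a and the root lies in (0, max|x|], so bisect there.
    """
    x2 = np.asarray(x, dtype=float) ** 2
    half_n = x2.size / 2
    lo, hi = 1e-12, float(np.sqrt(x2.max()))
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if np.sum(x2 / (x2 + mid**2)) > half_n:
            lo = mid  # left side still too large: the root is above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def calibrate(sampler, reps=100, n=100, seed=0):
    """Median sample SD and median Cauchy-scale MLE over `reps` samples of size n."""
    rng = np.random.default_rng(seed)
    sds, scales = [], []
    for _ in range(reps):
        x = sampler(rng, n)
        sds.append(np.std(x, ddof=1))
        scales.append(cauchy_scale_mle(x))
    return float(np.median(sds)), float(np.median(scales))


# Gaussian data: 100 replications of 100 iid standard normals.
sigma_M, a_M = calibrate(lambda rng, n: rng.standard_normal(n))
print(sigma_M, a_M)
```

Passing `lambda rng, n: rng.standard_cauchy(n)` as the sampler gives the analogous experiment for the Cauchy data of subsection b.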

Tables 1 and 2 show results for simulation experiments with 500 replications. In each replication, $Y(0)$ is generated from a normal distribution with mean $\theta_1$ and standard deviation $\sigma(\infty) = \sigma/\sqrt{1-\rho^2}$, with $\sigma = 1$ and $\rho = 0.5$. For Table 1, $Y(t)$ is generated from a normal distribution with mean

$$m(t) = \theta_1 + \rho^t\,(Y(0) - \theta_1)$$

and standard deviation $\sigma(t) = \sigma(\infty)\sqrt{1-\rho^{2t}}$; that is, the Signature observed at time $t$ is from Item 1. For Table 2, $Y(t)$ is generated from a normal distribution with mean $\theta_2$ and standard deviation $\sigma(\infty)$; that is, the Signature at time $t$ is from Item 2.

In both Tables the Gaussian classification procedure assumes that $\hat\sigma_M = 1.0$ is the correct standard deviation. The Cauchy classification procedure assumes that $\hat a_M = 0.607$ is the correct value of $a$.
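The conditional mean and standard deviation used for Table 1 coincide with those of a Gaussian AR(1) recursion run for $t$ steps, whose conditional variance is $\sigma^2 \sum_{k<t} \rho^{2k} = \sigma(\infty)^2 (1-\rho^{2t})$. The Python sketch below assumes the recursion $Y(s) - \theta_1 = \rho\,(Y(s-1) - \theta_1) + \sigma\,\varepsilon(s)$ with iid standard normal $\varepsilon(s)$ (an assumption about the form of the process, chosen to match $\sigma(\infty) = \sigma/\sqrt{1-\rho^2}$) and compares simulated paths with $m(t)$ and $\sigma(\infty)\sqrt{1-\rho^{2t}}$. The starting value $y(0) = 2.5$ and the function names are hypothetical.

```python
import numpy as np


def ar1_at_t(y0, theta, rho, sigma, t, reps, rng):
    """Y(t) from Y(s) - theta = rho*(Y(s-1) - theta) + sigma*eps(s), Y(0) = y0."""
    x = np.full(reps, y0 - theta, dtype=float)
    for _ in range(t):
        x = rho * x + sigma * rng.standard_normal(reps)
    return theta + x


def check_conditional_moments(y0=2.5, theta=1.0, rho=0.5, sigma=1.0,
                              times=(1, 2, 5, 10), reps=200_000, seed=0):
    """Return (t, simulated mean, m(t), simulated sd, sigma(t)) for each t."""
    rng = np.random.default_rng(seed)
    sigma_inf = sigma / np.sqrt(1 - rho**2)
    rows = []
    for t in times:
        yt = ar1_at_t(y0, theta, rho, sigma, t, reps, rng)
        m_t = theta + rho**t * (y0 - theta)
        s_t = sigma_inf * np.sqrt(1 - rho ** (2 * t))
        rows.append((t, float(yt.mean()), m_t, float(yt.std()), float(s_t)))
    return rows


for row in check_conditional_moments():
    print(row)
```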

The values in Table 1 suggest that when the same Item produces the Signature at times 0 and $t$, the Gaussian procedure produces more correct classifications for small $t$. For larger $t$, however, the number of correct classifications is the same for both procedures.

The values of Table 2 suggest that if a different Item is producing the 
Signature at time t, then the Cauchy classification procedure has more correct 
classifications at time t for small t even though the data are Gaussian. For 
larger t, both procedures have the same number of correct identifications. 

b. Cauchy Data

In this subsection the data arise from a Cauchy autoregressive process. The mean Signatures of the two Items, $\theta_1$ and $\theta_2$, are assumed known, and $\rho = \rho_1 = \rho_2$ is also assumed known. It remains to assign values to the scale parameters of the Gaussian and Cauchy classification procedures. In particular, what should the scale parameter $\sigma = \sigma_1 = \sigma_2$ of the Gaussian procedure be when it is applied to Cauchy data?
TABLE 1. GAUSSIAN DATA

Item 1 Produces the Signature at time 0
Item 1 Produces the Signature at time t

          Fraction Correct Identifications  Number of times
Time t    Gaussian Proc.    Cauchy Proc.    Gaussian Correct/    Gaussian Incorrect/
                                            Cauchy Incorrect     Cauchy Correct
   1          0.77              0.65              50                     0
   2          0.68              0.67               5                     0
   5          0.67              0.67               0                     0
  10          0.70              0.70               0                     0

TABLE 2. GAUSSIAN DATA

Item 1 Produces the Signature at time 0
Item 2 Produces the Signature at time t

          Fraction Correct Identifications  Number of times
Time t    Gaussian Proc.    Cauchy Proc.    Gaussian Correct/    Gaussian Incorrect/
                                            Cauchy Incorrect     Cauchy Correct
   1          0.64              0.71               0                    38
   2          0.69              0.71               0                     6
   5          0.64              0.64               1                     0
  10          0.68              0.68               0                     0

To obtain reasonable values for $\sigma$ for the Gaussian classification procedure and for $a$ for the Cauchy classification procedure, the following simulation experiment was performed. The experiment has 100 replications. Each replication generates 100 standard Cauchy random numbers. For each replication, the standard deviation of the data is computed and the maximum likelihood estimate of $a$ is obtained numerically. The medians of the 100 estimates of $a$ and of the 100 standard deviations are then computed. The values obtained are

$$\hat\sigma_M = 13.23 \quad\text{and}\quad \hat a_M = 1.03.$$

Note the high value of the standard deviation, which reflects the fact that the Cauchy distribution has no finite variance.
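The size of $\hat\sigma_M$ is easy to reproduce. As an illustration (not the estimator used in the report), the Python sketch below contrasts the median sample standard deviation of standard Cauchy samples with the median of half the interquartile range, which estimates $a$ because the quartiles of a Cauchy distribution with location 0 and scale $a$ are $\pm a$. It assumes NumPy's `default_rng` generator; the function name is hypothetical.

```python
import numpy as np


def median_sd_and_half_iqr(n, reps=100, seed=0):
    """Medians over `reps` standard Cauchy samples of size n of
    (sample standard deviation, half the interquartile range)."""
    rng = np.random.default_rng(seed)
    sds, half_iqrs = [], []
    for _ in range(reps):
        x = rng.standard_cauchy(n)
        q1, q3 = np.percentile(x, [25, 75])
        sds.append(np.std(x, ddof=1))
        half_iqrs.append(0.5 * (q3 - q1))  # estimates a: quartiles are -a and +a
    return float(np.median(sds)), float(np.median(half_iqrs))


for n in (100, 1_000, 10_000):
    print(n, median_sd_and_half_iqr(n))
```

The sample standard deviation keeps growing as $n$ increases, while the quartile-based value stays near the true scale 1.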

Tables 3 and 4 present results of simulation experiments in which the data are from a Cauchy autoregressive process. All experiments have 500 independent replications. For each replication, $Y(0)$ is generated from a Cauchy distribution with location parameter $\theta_1$ and scale parameter 1; that is, Item 1 is producing the Signature at time 0. For the replications reported in Table 4, $Y(t)$ is generated from a Cauchy distribution with location parameter $\theta_2$ and scale parameter 1; that is, Item 2 is producing the Signature at time $t$. For the replications reported in Table 3, $Y(t)$ is generated from a Cauchy distribution having density function

$$f(x) = \frac{a(t)}{\pi}\,\frac{1}{a(t)^2 + (x - m(t))^2}$$

with

$$m(t) = \theta_1 + \rho^t\,(y(0) - \theta_1)$$

and

$$a(t) = 1 - |\rho|^t;$$

that is, Item 1 is also producing the Signature at time $t$.

In both Tables 3 and 4, the Gaussian classification procedure assumes the standard deviations $\sigma_1 = \sigma_2 = \hat\sigma_M$. The Cauchy classification procedure assumes the $a$-parameters $a_1 = a_2 = \hat a_M$.
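The design of the four tables can be sketched as a single Python harness. This is a sketch under stated assumptions, not the authors' program: each procedure assigns $Y(t)$ to Item 1 when its Item 1 density (the conditional law given $Y(0)$) exceeds its Item 2 density (the stationary law), which corresponds to equal prior probabilities; the Gaussian procedure treats its scale value as $\sigma(\infty)$; the Gaussian-data branch uses the conditional standard deviation $\sigma(\infty)\sqrt{1-\rho^{2t}}$; and NumPy's generator replaces LLRANDOM. Function and argument names are hypothetical. Each call returns the quantities reported in one table row.

```python
import numpy as np

THETA1, THETA2, RHO = 1.0, 2.0, 0.5  # settings of Section 4.3


def log_gauss(x, m, s):
    return -0.5 * np.log(2 * np.pi * s**2) - (x - m) ** 2 / (2 * s**2)


def log_cauchy(x, m, a):
    return np.log(a / np.pi) - np.log(a**2 + (x - m) ** 2)


def says_item1(y0, yt, t, family, scale):
    """True where the procedure assigns Y(t) to Item 1 (equal priors assumed).

    Item 1 density: conditional law of Y(t) given Y(0) = y0.
    Item 2 density: stationary (marginal) law centred at THETA2.
    """
    m = THETA1 + RHO**t * (y0 - THETA1)
    if family == "gauss":  # scale plays the role of sigma(infinity)
        l1 = log_gauss(yt, m, scale * np.sqrt(1 - RHO ** (2 * t)))
        l2 = log_gauss(yt, THETA2, scale)
    else:  # Cauchy procedure: scale is a
        l1 = log_cauchy(yt, m, scale * (1 - abs(RHO) ** t))
        l2 = log_cauchy(yt, THETA2, scale)
    return l1 > l2


def simulate(data, same_item, t, sigma_M, a_M, reps=500, seed=0):
    """One table row: (fraction correct, Gaussian proc.; fraction correct,
    Cauchy proc.; count Gaussian correct/Cauchy incorrect; count Gaussian
    incorrect/Cauchy correct)."""
    rng = np.random.default_rng(seed)
    if data == "gauss":
        s_inf = 1.0 / np.sqrt(1 - RHO**2)  # sigma(infinity) with sigma = 1
        y0 = THETA1 + s_inf * rng.standard_normal(reps)
        m = THETA1 + RHO**t * (y0 - THETA1)
        if same_item:
            yt = m + s_inf * np.sqrt(1 - RHO ** (2 * t)) * rng.standard_normal(reps)
        else:
            yt = THETA2 + s_inf * rng.standard_normal(reps)
    else:  # Cauchy data with scale 1
        y0 = THETA1 + rng.standard_cauchy(reps)
        m = THETA1 + RHO**t * (y0 - THETA1)
        if same_item:
            yt = m + (1 - abs(RHO) ** t) * rng.standard_cauchy(reps)
        else:
            yt = THETA2 + rng.standard_cauchy(reps)
    g = says_item1(y0, yt, t, "gauss", sigma_M)
    c = says_item1(y0, yt, t, "cauchy", a_M)
    if not same_item:  # the correct answer is Item 2
        g, c = ~g, ~c
    return (float(g.mean()), float(c.mean()),
            int(np.sum(g & ~c)), int(np.sum(~g & c)))


for t in (1, 2, 5, 10):
    print(t, simulate("cauchy", same_item=False, t=t, sigma_M=13.23, a_M=1.03))
```

The loop mimics the layout of Table 4; because the random numbers and some modelling details are assumed, the fractions should be compared with the table only qualitatively.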

The results of Table 3 indicate that for small times $t$, if the same Item is producing the Signature at time 0 and time $t$, then the Gaussian classification procedure has more correct classifications even though the data are Cauchy. For larger times $t$, the number of correct classifications is the same for both procedures. On the other hand, the results of Table 4 indicate that if a different Item is producing the Signature at time $t$, then the Cauchy classification procedure has many more correct classifications than the Gaussian procedure for small times $t$ (at $t = 1$, a fraction of 0.74 correct versus 0.09). Once again the number of correct identifications is the same for both procedures as $t$ becomes larger.



TABLE 3. CAUCHY DATA

Item 1 Produces the Signature at time 0
Item 1 Produces the Signature at time t

          Fraction Correct Identifications  Number of times
Time t    Gaussian Proc.    Cauchy Proc.    Gaussian Correct/    Gaussian Incorrect/
                                            Cauchy Incorrect     Cauchy Correct
   1          0.98              0.73             124                     0
   5          0.72              0.69              18                     0
  10          0.66              0.66               0                     1

TABLE 4. CAUCHY DATA

Item 1 Produces the Signature at time 0
Item 2 Produces the Signature at time t

          Fraction Correct Identifications  Number of times
Time t    Normal Proc.      Cauchy Proc.    Normal Correct/      Normal Incorrect/
                                            Cauchy Incorrect     Cauchy Correct
   1          0.09              0.74               0                   316
   2          0.14              0.70               0                   282
   5          0.67              0.61               0                    29
  10          0.64              0.64               0                     0
c. Summary

The differences in performance between the two classification procedures appear for small times $t$. If the same Item is producing the Signatures at both times 0 and $t$, then the Gaussian classification procedure has more correct classifications for small $t$, for both Gaussian and Cauchy data. If a different Item is producing the Signature at time $t$, then the Cauchy classification procedure has more correct classifications, again for both Gaussian and Cauchy data. The effect is strongest when the data are from a Cauchy autoregressive process; in this case the Gaussian procedure does very poorly when different Items are producing the Signatures.

In summary, it is important to realize that the performance of a Bayesian classification procedure can be influenced by its underlying distributional assumptions. A classification procedure based on Gaussian distributional assumptions can be reluctant to assign a new observation that comes from a different Item to that different Item. A classification procedure based on Cauchy distributional assumptions can be reluctant to assign a new observation that comes from the same Item to that same Item. Hence, if there is uncertainty about the underlying distribution of the data, it might be better to combine the results of several classification procedures based on different distributional assumptions.